Building a peer-to-peer full-text Web search engine with highly discriminative keys

Authors

  • Karl Aberer
  • Fabius Klemm
  • Toan Luu
  • Ivana Podnar
  • Martin Rajman
Abstract

Web search engines designed on top of peer-to-peer (P2P) overlay networks show promise to enable attractive search scenarios operating at a large scale. However, the design of effective indexing techniques for extremely large document collections still raises a number of open technical challenges. Resource sharing, self-organization, and low maintenance costs are favorable properties of P2P overlays from the perspective of large-scale search, but we also face new problems due to potentially huge bandwidth consumption during both indexing and querying, as well as the unavailability of global document collection statistics. Since a straightforward application of P2P solutions for Web search generates unscalable indexing and search traffic, we propose a novel indexing technique which maintains a global key index in structured P2P overlays. Keys are highly discriminative terms and term sets that appear in a restricted number of collection documents, thus limiting the size of the global index, while ensuring scalable search cost. Our experimental results show reasonable indexing costs while the retrieval quality is comparable to standard centralized solutions with TF-IDF ranking. Our indexing scheme represents a contribution toward realistic P2P Web search engines that opens the opportunity to virtually unlimited resources, well beyond the capacity of today’s best centralized
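
The abstract describes keys as terms and term sets that appear in a restricted number of documents. A minimal Python sketch of that idea follows. It is not the authors' implementation: the function name `discriminative_keys`, the parameters `df_max` and `max_size`, the rule that a key must not contain a smaller key, and the toy documents are all assumptions made for illustration, and the sketch runs on one machine rather than on a structured P2P overlay.

```python
from collections import defaultdict
from itertools import combinations


def discriminative_keys(docs, df_max=1, max_size=2):
    """Map each key (a frozenset of terms) to the sorted ids of documents containing it.

    Assumptions for this sketch: a term set is a key when it occurs in at
    most `df_max` documents and no proper subset of it is already a key.
    """
    # Tokenize naively on whitespace; real systems would stem and filter stop words.
    doc_terms = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    index = {}
    for size in range(1, max_size + 1):
        # Build posting lists for every term set of the current size.
        postings = defaultdict(set)
        for doc_id, terms in doc_terms.items():
            for combo in combinations(sorted(terms), size):
                postings[frozenset(combo)].add(doc_id)
        for key, ids in postings.items():
            if len(ids) > df_max:
                continue  # too frequent: not discriminative on its own
            if any(smaller < key for smaller in index):
                continue  # a smaller key already covers this term set
            index[key] = sorted(ids)
    return index


if __name__ == "__main__":
    toy_docs = {
        1: "p2p web search",
        2: "p2p overlay index",
        3: "web index search",
    }
    for key, ids in discriminative_keys(toy_docs).items():
        print(sorted(key), ids)
```

Because every key is capped at `df_max` documents, each posting list stays short, which is the property the abstract relies on to limit index size and search cost.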

Similar resources

Using Highly Discriminative Keys for Indexing in a Peer-to-Peer Full-Text Retrieval System

Excessive network bandwidth consumption, caused by the transmission of long posting lists, was identified as one of the major bottlenecks for implementing distributed full-text retrieval in a Peer-to-Peer (P2P) architecture. To address this problem we introduce a novel approach to indexing using highly discriminative terms and term sets, which leads to short posting lists and therefore reduces t...

Beyond Term Indexing: A P2P Framework for Web Information Retrieval

Web search over peer-to-peer (P2P) networks shows promise to become an alternative to the state-of-the-art search engines since P2P overlays offer means for decentralized search across widely-distributed document collections. However, the design of effective techniques for P2P indexing and retrieval raises a number of technical challenges due to potentially unscalable resource (e.g. bandwidth, ...

SEARCH ENGINE IN LARGE-SCALE PEER-TO-PEER SYSTEMS by AKSHAY LAL

LAL, AKSHAY. Dgoogle: A Full-Text Search Engine in Large-Scale Peer-to-Peer Systems. (Under the direction of Professor Khaled Harfoush). Full-text search engines like Google serve an important role in accessing Internet resources. In such engines, a search for web pages matching a user’s query is typically carried out on a set of co-administered, physically co-located clusters of servers. Full-...

Integrating RDF Querying Capabilities into a Distributed Search Infrastructure

The Semantic Web is inherently distributed, and covers both metadata and full-text information. Semantic search therefore can profit a lot from peer-to-peer infrastructures as well as from powerful metadata search functionalities based on full-text search technologies. In this paper we focus on an approach extending an existing P2P search infrastructure with RDF querying capabilities, which bot...

Towards large scale peer-to-peer web search

Web search engines, such as Google and Yahoo, are based on the centralized database model. Search engines using the centralized database model suffer from several drawbacks: they have a single point of failure, they offer a limited representation of the web, their index is not up to date, and they scale poorly. Currently a lot of research is being done on using peer-to-peer (P2P) technology for the u...

Publication date: 2005